Pivot, Box and Trilingual: Lexicon Extraction for Low-Resource Language Pairs with Extended Topic Models
Abstract
Data-driven approaches to natural language processing have proven highly effective, and bilingual lexicon extraction is no exception. Although training data is readily available for many language pairs, many existing approaches fail for language pairs for which no parallel data exists. Despite the many studies on bilingual lexicon extraction, little attention has been paid to the important problem of accommodating low-resource language pairs. We present a variety of solutions to this problem, demonstrate their application to a practical scenario, and compare their effectiveness with that of mainstream approaches. In this paper we develop pivot-based approaches for bilingual lexicon extraction using the framework of topic modelling [1]. Topic modelling has been a popular approach to bilingual lexicon extraction; however, its use as a pivot model has yet to be explored.
Similar resources
Pivot-Based Topic Models for Low-Resource Lexicon Extraction
This paper proposes a range of solutions to the challenges of extracting large and high-quality bilingual lexicons for low-resource language pairs. In such scenarios there is often no parallel or even comparable data available. We design three effective pivot-based approaches inspired by the state-of-the-art technique of bilingual topic modelling, extending previous work to take advantage of trili...
The Trilingual ALLEGRA Corpus: Presentation and Possible Use for Lexicon Induction
In this paper, we present a trilingual parallel corpus for German, Italian and Romansh, a Swiss minority language spoken in the canton of Grisons. The corpus called ALLEGRA contains press releases automatically gathered from the website of the cantonal administration of Grisons. Texts have been preprocessed and aligned with a current state-of-the-art sentence aligner. The corpus is one of the f...
Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction
A pivot-based approach to bilingual lexicon extraction relies on the similarity of context vectors represented by words in a pivot language such as English. In this paper, to show the validity and usability of the pivot-based approach, we evaluate it together with two different methods for estimating context vectors: one estimates them from two parallel corpora based on word as...
Constraint-Based Bilingual Lexicon Induction for Closely Related Languages
The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. Pivot language and cognate recognition approaches have proven useful for inducing bilingual lexicons for such languages. We analyze the features of closely related languages and define a semantic constraint assumption. Based on the assumption, we propose...
Bilingual Lexicon Extraction via Pivot Language and Word Alignment Tool
This paper presents a simple and effective method for automatic bilingual lexicon extraction from less-known language pairs. To do this, we bring in a bridge language, called the pivot language, and adopt information retrieval techniques combined with natural language processing techniques. Moreover, we use a freely available word aligner, Anymalign (Lardilleux et al., 2011), for constructing conte...
Publication date: 2014